Abstract

The random graph model has recently been extended to a random preferential attachment graph model, in order to enable the study of general asymptotic properties in network types that are better represented by the preferential attachment evolution model than by the ordinary (uniform) evolution model. Analogously, this paper extends the random hypergraph model to a random preferential attachment hypergraph model. We then analyze the degree distribution of random preferential attachment hypergraphs and show that they have heavy-tailed degree distributions, similar to those of random preferential attachment graphs. However, our results show that the exponent of the degree distribution is sensitive to whether one considers the structure as a hypergraph or as a graph.
1 Introduction


Random structures have proved to be an extremely useful concept in many disciplines, including mathematics, physics, economics and communication systems. Examining the typical behavior of random instances of a structure allows us to understand its fundamental properties. The foundations of random graph theory were laid in a seminal paper by Erdős and Rényi in the late 1950s [7]. Subsequently, several alternative models of random structures, often suited to other applications, were proposed. One of the most important alternatives is the preferential attachment model [2], which has been found particularly suitable for describing a variety of natural phenomena, such as the “rich get richer” phenomenon, that cannot be adequately simulated by the original Erdős-Rényi model. It has been shown that the preferential attachment model captures some universal properties of real-world social networks and complex systems, such as heavy-tailed degree distributions and the “small world” phenomenon [12].

One limitation of graphs is that they only capture dyadic (or binary) relations. In real life, however, many natural, physical and social phenomena involve $k$-ary relations for $k > 2$, and can therefore be represented more accurately by hypergraphs than by graphs. For example, collaborations among researchers, as manifested through joint coauthorship of scientific papers, may be better represented by hyperedges than by edges. Figure 1(a) depicts the hypergraph representation of the coauthorship relations of four papers: paper 1 authored by $\{a, b, e, f\}$, paper 2 authored by $\{a, c, d, g\}$, paper 3 authored by $\{b, c, d\}$ and paper 4 authored by $\{e, f\}$. Likewise, wireless communication networks [1], or social relations captured by photos that appear on Facebook and other social media, also form hyperedges […]. Affiliation models […], which are a popular model for social networks, are commonly interpreted as bipartite graphs, whereas they can sometimes be represented more conveniently as hypergraphs. Figure 1(b) presents the bipartite graph representation of the hypergraph $H$ of Figure 1(a). Sometimes one can only access the observed graph $G(H)$ of the original hypergraph $H$, that is, only the pairwise relations between players are available (see Figure 1(c)). In some cases this structure may be sufficient for the application at hand, but in many other cases the hypergraph structure is more accurate and informative.

The study of hypergraphs, and in particular of random hypergraph models, has its roots in a 1976 paper by Erdős and Bollobás […], which offers a model analogous to the Erdős-Rényi random graph model [7]. Recently, several interesting properties of the evolution of random hypergraphs in this model were studied in […].

The current paper is motivated by the observation that, just as in the random graph case, the random hypergraph model is not suitable for studying social networks. Our first contribution is to extend the concept of random preferential attachment graphs to random preferential attachment hypergraphs. We believe that this natural model will prove useful in the future study of social networks and other complex systems.

Our main technical contribution is an analysis of the degree distribution of random preferential attachment hypergraphs, showing that they have heavy-tailed degree distributions, similar to those of random preferential attachment graphs. However, our results show that the exponent of the degree distribution is sensitive to whether one considers the structure as a hypergraph or as a graph.

As a reference point, we consider the random preferential attachment graph model of Chung and Lu [3]. In that model, starting from an initial graph $G_0$, at each time step one of two types of events occurs: (1) a vertex-arrival event, occurring with probability $p$, in which a new vertex joins the network and selects its neighbor among the existing vertices via preferential attachment, or (2) an edge-arrival event, occurring with probability $1 - p$, in which a new edge joins the network and selects its two endpoints from among the existing vertices via preferential attachment. It is shown in [3] that the degree distribution of the random preferential attachment graph follows a power law, i.e., the probability that a random vertex has degree $k$ is proportional to $k^{-\beta}$, with $\beta = 2 + \frac{p}{2-p}$. A similar result can be shown in a setting where, at each time step, $d$ edges join the graph instead of only one (in either a vertex event or an edge event) [12]. This result holds even if at each step a random number of edges join the network, as long as the expected number of new edges is $d$ and the variance is bounded.

Figure 1: (a) A hypergraph $H$ with 7 vertices and 4 edges. (b) A bipartite graph representation of $H$. (c) The observed graph $G(H)$.

The model proposed here extends Chung and Lu’s model [3] to hypergraphs. That is, the process starts with an initial hypergraph, and at each time step a random hyperedge joins the network. With probability $p$ this new random hyperedge includes a new vertex, and with probability $1 - p$ it does not. Our model allows the hyperedge sizes to be random (subject to some restrictions), and the members of each hyperedge are selected randomly according to preferential attachment.

We show that the degree distribution of the resulting hypergraph (as well as of the observed graph) follows a power law, but with exponent $\beta_H = 2 + \frac{p}{\mu - p}$, where $\mu$ is the expected size of a hyperedge.

Our results indicate that one should be careful when studying an observed graph of a general $k$-ary relation. In particular, it makes a difference whether the observed graph was generated by a graph or by a hypergraph evolution mechanism, since the two generate observed graphs with different degree distributions.

In the next sections we describe the preferential attachment hypergraph model in more detail, and then analyze the resulting degree distribution.

2 Preliminaries 

Given a set $V$ and a natural number $k \ge 1$, let $V^{(k)}$ be the set of all unordered vectors (or multisets) of $k$ elements from $V$. A finite undirected graph $G$ is an ordered pair $(V, E)$, where $V$ is a set of $n$ vertices and $E \subseteq V^{(2)}$ is the set of graph edges (unordered pairs from $V$, including self loops).

A hypergraph $\mathcal{H}$ is an ordered pair $(V, \mathcal{E})$, where $V$ is a set of $n$ vertices and $\mathcal{E} \subseteq \bigcup_{k \ge 1} V^{(k)}$ is a set of hyperedges connecting the vertices (including self loops). The rank $r(\mathcal{H})$ of a hypergraph $\mathcal{H}$ is the maximum cardinality of any of its hyperedges. When all hyperedges have the same cardinality $k$, the hypergraph is said to be $k$-uniform; a graph is thus simply a 2-uniform hypergraph. The degree of a hyperedge $e \in \mathcal{E}$ is defined to be $\delta(e) = |e|$. The set of all hyperedges that contain the vertex $v$ is denoted $\mathcal{E}(v) = \{e \in \mathcal{E} \mid v \in e\}$. The degree $d(v)$ of a vertex $v$ is the number of hyperedges in $\mathcal{E}(v)$, i.e., $d(v) = |\mathcal{E}(v)|$. $\mathcal{H}$ is $d$-regular if every vertex has degree $d$.
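These definitions can be checked on the hypergraph of Figure 1(a). The Python sketch below is illustrative only (all helper names are hypothetical): each hyperedge is stored as a list, so that repeated members model multisets, and a member repeated inside one hyperedge is counted once in $d(v)$, following the self-loop convention adopted for the model in Section 3.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence

# Illustrative sketch; helper names are hypothetical.
Hyperedge = Sequence[Hashable]  # a multiset of vertices, stored as a list


def hyperedge_degree(e: Hyperedge) -> int:
    """delta(e) = |e|, counting repeated members of the multiset."""
    return len(e)


def vertex_degrees(hyperedges: Iterable[Hyperedge]) -> Counter:
    """d(v) = |E(v)|, the number of hyperedges that contain v.

    A vertex repeated inside one hyperedge (a self loop) is counted once.
    """
    deg = Counter()
    for e in hyperedges:
        for v in set(e):
            deg[v] += 1
    return deg


def rank(hyperedges: Iterable[Hyperedge]) -> int:
    """r(H): the maximum cardinality of any hyperedge."""
    return max(hyperedge_degree(e) for e in hyperedges)


def is_uniform(hyperedges: Iterable[Hyperedge], k: int) -> bool:
    """True if every hyperedge has cardinality k (k = 2 gives a graph)."""
    return all(hyperedge_degree(e) == k for e in hyperedges)


# Coauthorship hypergraph of Figure 1(a): one hyperedge per paper.
H = [["a", "b", "e", "f"], ["a", "c", "d", "g"], ["b", "c", "d"], ["e", "f"]]
print(vertex_degrees(H), rank(H))
```

Here vertex $g$ has degree 1, every other vertex has degree 2, and the rank is 4, so $H$ is neither regular nor uniform.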

In the classical preferential attachment graph model [2], the evolution process starts with an arbitrary finite initial network $G_0$, which is usually set to a single vertex with a self loop. This initial network then evolves in time, with $G_t$ denoting the network after time step $t$. In every time step $t$ a new vertex $v$ enters the network. On arrival, $v$ attaches itself to an existing vertex $u$ chosen at random with probability proportional to its degree at time $t$, i.e.,

$$\mathbb{P}[u \text{ is chosen}] = \frac{d_t(u)}{\sum_{w \in G_t} d_t(w)},$$

where $d_t(x)$ is the degree of vertex $x$ at time $t$.
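A minimal Python sketch of this process follows. It assumes (the text does not say) that the self loop of $G_0$ counts 2 toward the degree, as is usual for graphs; the function name is hypothetical. Keeping one list entry per edge endpoint means vertex $u$ occurs $d_t(u)$ times, so a uniform draw from the list selects $u$ with exactly the probability above.

```python
import random


def classical_pa(n_steps: int, seed=None) -> list:
    """Illustrative (hypothetical) sketch of the classical PA process.

    Assumption: G_0 is vertex 0 with a self loop counted twice in its degree.
    Returns deg, where deg[v] is the final degree of vertex v.
    """
    rng = random.Random(seed)
    deg = [2]
    # One entry per edge endpoint: vertex v appears deg[v] times, so a uniform
    # draw picks u with probability d_t(u) / sum_w d_t(w).
    endpoints = [0, 0]
    for _ in range(n_steps):
        u = rng.choice(endpoints)  # preferential choice among existing vertices
        v = len(deg)               # the newly arriving vertex
        deg.append(1)
        deg[u] += 1
        endpoints.extend((u, v))
    return deg


print(max(classical_pa(10_000, seed=1)))  # size of the largest hub
```

The endpoint list keeps each step $O(1)$ instead of recomputing the normalizing sum over all vertices.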

3 The nonuniform preferential attachment hypergraph model 

Similar to the classical preferential attachment graph model [3], the evolution of the hypergraph occurs along a discrete time axis, with one event occurring at each time step. We consider two types of possible events on the hypergraph at time $t$: (1) a vertex arrival event, in which a new vertex is added along with a new hyperedge, and (2) a hyperedge arrival event, in which only a new hyperedge is added.

We consider a nonuniform random hypergraph in which self loops (i.e., multiple appearances of a vertex in a hyperedge) are allowed. We count a self loop as contributing 1 to the vertex degree. Similar to [3], our preferential attachment model, $H(H_0, p, Y)$, has three parameters:

• A probability $0 < p \le 1$ for vertex arrival events.

• An initial hypergraph $H_0$ given at time 0.

• A sequence of independent integer-valued random variables $Y = (Y_0, Y_1, Y_2, \ldots)$, with $Y_t \ge 2$, which determine the cardinality of the new hyperedge arriving at time $t$.

The process by which the random hypergraph $H(H_0, p, Y)$ grows in time is as follows.

• We start with the initial hypergraph $H_0$ at time 0.

• At time $t > 0$, the hypergraph $H_t$ is formed from $H_{t-1}$ in the following way:

– Draw a random bit $b$ with $\mathbb{P}[b = 0] = p$.

– If $b = 0$, add a new vertex $u$ to $V$, select $Y_t - 1$ vertices from $H_{t-1}$ (possibly with repetitions) independently in proportion to their degrees in $H_{t-1}$, and form a new hyperedge $e$ that includes $u$ and the $Y_t - 1$ selected vertices.¹

– Otherwise, select $Y_t$ vertices from $H_{t-1}$ (possibly with repetitions) independently in proportion to their degrees in $H_{t-1}$, and form a new hyperedge $e$ that includes the $Y_t$ selected vertices.

Hereafter, we consider an initial $H_0$ consisting of a single hyperedge of cardinality $Y_0$ over a single vertex (recall that self loops contribute 1 to the vertex degree).

¹Note that as the hypergraph grows, the probability of adding a self loop vanishes.
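The growth rule of $H(H_0, p, Y)$ can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: `sample_size(rng, t)` is a hypothetical user-supplied sampler returning $Y_t \ge 2$, and vertices are the integers $0, 1, \ldots$, with vertex 0 forming $H_0$ (degree 1). A list holding vertex $v$ exactly $d(v)$ times turns degree-proportional selection into a uniform draw.

```python
import random
from typing import Callable


def simulate_pa_hypergraph(T: int, p: float,
                           sample_size: Callable[[random.Random, int], int],
                           seed=None) -> list:
    """Illustrative sketch of H(H_0, p, Y) run for T steps; returns deg[v].

    H_0 is one hyperedge of cardinality Y_0 on vertex 0, so vertex 0 starts
    with degree 1. sample_size is a hypothetical sampler for Y_t >= 2.
    """
    rng = random.Random(seed)
    deg = [1]
    pool = [0]  # vertex v appears deg[v] times: uniform draw = preferential
    for t in range(1, T + 1):
        y = sample_size(rng, t)
        if rng.random() < p:       # b = 0: vertex-arrival event
            new = len(deg)
            deg.append(0)
            members = {new}
            draws = y - 1
        else:                      # b = 1: hyperedge-arrival event
            members = set()
            draws = y
        # Independent draws with repetition, all from H_{t-1} (pool is not yet
        # updated); repeats collapse in the set, so a self loop adds only 1.
        members.update(rng.choice(pool) for _ in range(draws))
        for v in members:
            deg[v] += 1
            pool.append(v)
    return deg


# Example: Y_t uniform on {2, 3, 4}, so mu = 3.
deg = simulate_pa_hypergraph(10_000, 0.5, lambda rng, t: rng.choice([2, 3, 4]), seed=1)
print(len(deg), max(deg))
```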
4 Degree Distribution Analysis


To ensure convergence of the degree distribution, we first need to impose some conditions on the distribution of the hyperedge cardinalities. These are fairly mild conditions that seem to agree with real data (see Fig. 4 in Section 5). Let $Y_t$ be independent (not necessarily identical) random variables with constant expectation $\mathbb{E}[Y_t] = \mu$ and bounded support, such that $2 \le Y_t \le t^{1/3}$.² Under these conditions we can show the following.


Theorem 4.1. The degree distribution of a hypergraph $H(H_0, p, Y)$ with $\mathbb{E}[Y_t] = \mu$ follows a power law with exponent $\beta = 2 + p/(\mu - p)$.


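Proof. We start with properties of $Y_t$. Let $S_t = \sum_{i=1}^{t} Y_i$, so that $\mathbb{E}[S_t] = \mu t$ and, since $Y_i \le i^{1/3} \le t^{1/3}$, $S_t \le t^{4/3}$. The deviation of $S_t$ from its expected value can be bounded as follows.

Lemma 4.2.

$$\mathbb{P}\left[\,|S_t - \mathbb{E}[S_t]| > t^{5/6}\sqrt{2\log t}\,\right] \le O\!\left(\frac{1}{t^4}\right).$$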


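Proof. By Hoeffding’s inequality [9], if the independent random variables $Y_i$ satisfy $\mathbb{P}[Y_i \in [a_i, b_i]] = 1$ for some reals $a_i$ and $b_i$, then

$$\mathbb{P}\left[\,|S_t - \mathbb{E}[S_t]| \ge x\,\right] \le 2\exp\left(-\frac{2x^2}{\sum_{i=1}^{t}(b_i - a_i)^2}\right).$$

Taking $x = t^{5/6}\sqrt{2\log t}$ and noting that $(b_i - a_i)^2 \le t^{2/3}$, so that $\sum_{i=1}^{t}(b_i - a_i)^2 \le t^{5/3}$, the exponent is at most $-2 \cdot 2t^{5/3}\log t / t^{5/3} = -4\log t$, which yields the bound $2t^{-4}$. □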
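The arithmetic behind the choice $x = t^{5/6}\sqrt{2\log t}$ can be checked numerically. The Python sketch below (helper names are hypothetical) evaluates the Hoeffding tail with $\sum_i (b_i - a_i)^2$ replaced by its upper bound $t^{5/3}$; it should coincide with $2t^{-4}$, while the relative deviation $x/t$ still tends to 0.

```python
import math

# Illustrative check of the Hoeffding arithmetic (helper names are hypothetical).


def deviation_threshold(t: float) -> float:
    """x = t^(5/6) * sqrt(2 log t)."""
    return t ** (5 / 6) * math.sqrt(2 * math.log(t))


def hoeffding_tail(t: float) -> float:
    """2 exp(-2 x^2 / V) with V = t^(5/3), the bound on sum_i (b_i - a_i)^2."""
    x = deviation_threshold(t)
    return 2 * math.exp(-2 * x ** 2 / t ** (5 / 3))


for t in (10.0, 1e3, 1e6):
    # Columns: tail bound, 2 t^-4, and the relative deviation x / t.
    print(t, hoeffding_tail(t), 2 * t ** -4, deviation_threshold(t) / t)
```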

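To bound the degree distribution of a nonuniform random hypergraph, we closely follow Chung and Lu’s analysis of preferential attachment graphs […]. Let $m_{k,t}$ denote the number of vertices of degree $k$ at time $t$. Note that $m_{0,t} = 0$ for every $t$, since every vertex has degree at least 1, and that $m_{1,0} = 1$ and $m_{k,0} = 0$ for $k \ge 2$, since $H_0$ consists of a single vertex of degree 1. We derive a recurrence for the expected value $\mathbb{E}[m_{k,t}]$. The main observation is that a vertex has degree $k$ at time $t$ if either it had degree $k$ at time $t-1$ and was not selected into the hyperedge at time $t$, or it had degree $k-1$ at time $t-1$ and was selected into the hyperedge at time $t$. Each independent draw hits a given vertex of degree $k$ with probability $k/S_{t-1}$, where the total degree at time $t-1$ is approximated by $S_{t-1}$. Letting $\mathcal{F}_t$ be the $\sigma$-algebra associated with the probability space at time $t$, we have for any $t > 0$ and $k \ge 2$: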


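$$
\begin{aligned}
\mathbb{E}[m_{k,t} \mid \mathcal{F}_{t-1}] ={}& m_{k,t-1}\left[p\,\mathbb{E}_{Y_t}\!\left[\left(1 - \frac{k}{S_{t-1}}\right)^{Y_t - 1}\right] + (1-p)\,\mathbb{E}_{Y_t}\!\left[\left(1 - \frac{k}{S_{t-1}}\right)^{Y_t}\right]\right] \\
&+ m_{k-1,t-1}\left[p\,\mathbb{E}_{Y_t}\!\left[1 - \left(1 - \frac{k-1}{S_{t-1}}\right)^{Y_t - 1}\right] + (1-p)\,\mathbb{E}_{Y_t}\!\left[1 - \left(1 - \frac{k-1}{S_{t-1}}\right)^{Y_t}\right]\right],
\end{aligned}
$$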


²The exponent 1/3 is chosen somewhat arbitrarily; the result can be extended to any constant exponent $0 < \alpha < 1/2$.
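hence, since $1 - yx \le (1-x)^{y} \le 1 - yx + \binom{y}{2}x^2$ for $x \in [0,1]$ and every integer $y \ge 0$,

$$
\begin{aligned}
\mathbb{E}[m_{k,t} \mid \mathcal{F}_{t-1}] ={}& m_{k,t-1}\left[p - \mathbb{E}_{Y_t}\!\left[\frac{(Y_t-1)kp}{S_{t-1}}\right] + 1 - p - \mathbb{E}_{Y_t}\!\left[\frac{(1-p)Y_t k}{S_{t-1}}\right] + O\!\left(\mathbb{E}_{Y_t}\!\left[\left(\frac{Y_t k}{S_{t-1}}\right)^{2}\right]\right)\right] \\
&+ m_{k-1,t-1}\left[\mathbb{E}_{Y_t}\!\left[\frac{(Y_t-1)p(k-1)}{S_{t-1}}\right] + \mathbb{E}_{Y_t}\!\left[\frac{(1-p)Y_t(k-1)}{S_{t-1}}\right] + O\!\left(\mathbb{E}_{Y_t}\!\left[\left(\frac{Y_t k}{S_{t-1}}\right)^{2}\right]\right)\right],
\end{aligned}
$$

or, since $\mathbb{E}[Y_t] = \mu$ and $p(\mu - 1) + (1-p)\mu = \mu - p$,

$$
\mathbb{E}[m_{k,t} \mid \mathcal{F}_{t-1}] = m_{k,t-1}\left(1 - \frac{(\mu-p)k}{S_{t-1}} + O\!\left(\frac{k^2\,\mathbb{E}[Y_t^2]}{S_{t-1}^2}\right)\right) + m_{k-1,t-1}\left(\frac{(\mu-p)(k-1)}{S_{t-1}} + O\!\left(\frac{k^2\,\mathbb{E}[Y_t^2]}{S_{t-1}^2}\right)\right).
$$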
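To see how the first-order form arises, the Python sketch below (helper names are hypothetical) fixes the hyperedge size $Y_t = y$, so that $\mu = y$, and compares the exact bracket multiplying $m_{k,t-1}$ with its linearization $1 - (\mu - p)k/S_{t-1}$. By the inequality $1 - yx \le (1-x)^y \le 1 - yx + \binom{y}{2}x^2$, the gap should lie between 0 and $(ky/S_{t-1})^2/2$.

```python
# Illustrative comparison for a deterministic hyperedge size Y_t = y
# (helper names are hypothetical).


def not_selected_exact(k: int, S: float, p: float, y: int) -> float:
    """Bracket multiplying m_{k,t-1} when Y_t = y deterministically."""
    q = 1 - k / S  # probability that one draw misses a degree-k vertex
    return p * q ** (y - 1) + (1 - p) * q ** y


def not_selected_linear(k: int, S: float, p: float, mu: float) -> float:
    """First-order form 1 - (mu - p) k / S_{t-1}."""
    return 1 - (mu - p) * k / S


gap = not_selected_exact(5, 1e4, 0.5, 3) - not_selected_linear(5, 1e4, 0.5, 3)
print(gap, (5 * 3 / 1e4) ** 2 / 2)  # the gap lies within the quadratic bound
```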

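Using the bounds on $Y_t$ and $S_t$, we can now find the expectation $\mathbb{E}[m_{k,t}]$. Since $Y_t \le t^{1/3}$, we have $\mathbb{E}[Y_t^2] \le t^{1/3}\,\mathbb{E}[Y_t] = \mu t^{1/3}$, and since $Y_i \ge 2$, we have $S_{t-1} \ge 2(t-1)$; hence the error terms above are $O(k^2/t^{5/3})$. By Lemma 4.2, $S_{t-1} = \mu t \pm O(t^{5/6}\sqrt{\log t})$ except on an event of probability $O(1/t^4)$; on that event $m_{k,t} \le t + 1$, so it contributes at most $O(t \cdot t^{-4}) \subseteq O(1/t^2)$ to the expectation. Therefore

$$
\begin{aligned}
\mathbb{E}[m_{k,t}] ={}& \mathbb{E}[m_{k,t-1}]\left(1 - \frac{(\mu-p)k}{\mu t \pm O(t^{5/6}\sqrt{\log t})} + O\!\left(\frac{k^2}{t^{5/3}}\right)\right) \\
&+ \mathbb{E}[m_{k-1,t-1}]\left(\frac{(\mu-p)(k-1)}{\mu t \pm O(t^{5/6}\sqrt{\log t})} + O\!\left(\frac{k^2}{t^{5/3}}\right)\right) + O\!\left(\frac{1}{t^2}\right).
\end{aligned}
$$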


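For $t > 0$ and the special case $k = 1$ (here $m_{0,t-1} = 0$, and with probability $p$ a new vertex of degree 1 arrives) we have

$$\mathbb{E}[m_{1,t} \mid \mathcal{F}_{t-1}] = m_{1,t-1}\left(1 - \frac{\mu - p}{S_{t-1}} + O\!\left(\frac{\mathbb{E}[Y_t^2]}{S_{t-1}^2}\right)\right) + p,$$

thus

$$\mathbb{E}[m_{1,t}] = \mathbb{E}[m_{1,t-1}]\left(1 - \frac{\mu - p}{\mu t \pm O(t^{5/6}\sqrt{\log t})} + O\!\left(\frac{1}{t^{5/3}}\right)\right) + p + O\!\left(\frac{1}{t^2}\right).$$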

We use the following lemma of [3]. 

Lemma 4.3 ([3]). Let $(a_t)$, $(b_t)$, $(c_t)$ be three sequences such that $a_{t+1} = \left(1 - \frac{b_t}{t}\right)a_t + c_t$, $\lim_{t\to\infty} b_t = b > 0$ and $\lim_{t\to\infty} c_t = c$. Then $\lim_{t\to\infty} a_t/t$ exists and equals $c/(1+b)$.
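Lemma 4.3 can be illustrated numerically. The Python sketch below uses a hypothetical helper (not taken from [3]) that iterates $a_{t+1} = (1 - b_t/t)a_t + c_t$ from $a_1 = 0$ and reports $a_T/T$, which should approach $c/(1+b)$. With $b_t \equiv (\mu - p)/\mu$ and $c_t \equiv p$, as in the case $k = 1$, the limit is $p/(1 + (\mu-p)/\mu) = p\mu/(2\mu - p)$.

```python
from typing import Callable


def lemma_ratio(b: Callable[[int], float], c: Callable[[int], float],
                T: int, a1: float = 0.0) -> float:
    """Hypothetical helper: iterate a_{t+1} = (1 - b_t/t) a_t + c_t, return a_T / T."""
    a = a1
    for t in range(1, T):
        a = (1 - b(t) / t) * a + c(t)
    return a / T


# Case k = 1 with p = 1, mu = 3: b = (mu - p)/mu, c = p, so c / (1 + b) = 0.6.
p, mu = 1.0, 3.0
print(lemma_ratio(lambda t: (mu - p) / mu, lambda t: p, 100_000))
```

Sequences that only converge to $b$ and $c$ (for example $b_t = b + 1/t$) give the same limit, more slowly.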

We show by induction on $k$ that $\lim_{t\to\infty} \mathbb{E}[m_{k,t}]/t$ exists for each $k$; we denote this limit by $M_k$. For $k = 1$, apply Lemma 4.3 with

$$b_t = \frac{\mu - p}{\mu \pm O(t^{-1/6}\sqrt{\log t})} + O(t^{-2/3}) \quad\text{and}\quad c_t = p + O(1/t^2),$$
and hence

$$\lim_{t\to\infty} b_t = \frac{\mu - p}{\mu} \quad\text{and}\quad \lim_{t\to\infty} c_t = p,$$

to get

$$M_1 = \lim_{t\to\infty}\frac{\mathbb{E}[m_{1,t}]}{t} = \frac{p}{1 + (\mu - p)/\mu} = \frac{p\mu}{2\mu - p}.$$


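We now assume that $\lim_{t\to\infty}\mathbb{E}[m_{k-1,t}]/t = M_{k-1}$ exists and apply Lemma 4.3 again with

$$b_t = \frac{(\mu - p)k}{\mu \pm O(t^{-1/6}\sqrt{\log t})} + O\!\left(\frac{k^2}{t^{2/3}}\right)$$

and

$$c_t = \mathbb{E}[m_{k-1,t-1}]\left(\frac{(\mu-p)(k-1)}{\mu t \pm O(t^{5/6}\sqrt{\log t})} + O\!\left(\frac{k^2}{t^{5/3}}\right)\right) + O\!\left(\frac{1}{t^2}\right).$$

Then

$$\lim_{t\to\infty} b_t = b = \frac{(\mu-p)k}{\mu} \quad\text{and}\quad \lim_{t\to\infty} c_t = c = \frac{M_{k-1}(\mu-p)(k-1)}{\mu},$$

and by Lemma 4.3 we get that $\lim_{t\to\infty}\mathbb{E}[m_{k,t}]/t$ exists and satisfies

$$M_k = M_{k-1}\,\frac{(\mu-p)(k-1)/\mu}{1 + k(\mu-p)/\mu} = M_{k-1}\,\frac{k-1}{k + \frac{\mu}{\mu - p}}.$$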
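The recurrence for $M_k$ can be evaluated exactly with rational arithmetic. In the Python sketch below (a hypothetical helper), $M_1 = p\mu/(2\mu - p)$ and $M_k = M_{k-1}(k-1)/(k + \mu/(\mu - p))$. As a sanity check, for $p = 1$ and $\mu = 2$ (every step adds a new vertex together with one edge) unrolling the recurrence gives the closed form $M_k = 4/(k(k+1)(k+2))$, which the function reproduces.

```python
from fractions import Fraction


def limiting_fractions(p, mu, K: int) -> list:
    """Hypothetical helper: exact [M_1, ..., M_K] from the M_k recurrence."""
    p, mu = Fraction(p), Fraction(mu)
    c = mu / (mu - p)                        # the constant mu / (mu - p)
    M = [p * mu / (2 * mu - p)]              # M_1
    for k in range(2, K + 1):
        M.append(M[-1] * (k - 1) / (k + c))  # M_k from M_{k-1}
    return M


print(limiting_fractions(1, 2, 4))
# [Fraction(2, 3), Fraction(1, 6), Fraction(1, 15), Fraction(1, 30)]
```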


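Recall that a power law distribution has the property that $M_k \propto k^{-\beta}$ for large $k$. Now if $M_k \propto k^{-\beta}$, then

$$\frac{M_k}{M_{k-1}} = \frac{k^{-\beta}}{(k-1)^{-\beta}} = \left(1 - \frac{1}{k}\right)^{\beta} = 1 - \frac{\beta}{k} + O\!\left(\frac{1}{k^2}\right).$$

By the recurrence for $M_k$ derived above,

$$\frac{M_k}{M_{k-1}} = \frac{k-1}{k + \frac{\mu}{\mu-p}} = 1 - \frac{1 + \frac{\mu}{\mu-p}}{k + \frac{\mu}{\mu-p}} = 1 - \frac{1 + \frac{\mu}{\mu-p}}{k} + O\!\left(\frac{1}{k^2}\right),$$

so the exponent $\beta$ of the power law satisfies

$$\beta = 1 + \frac{\mu}{\mu - p} = 2 + \frac{p}{\mu - p}.$$

□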
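The final asymptotic step can also be checked numerically. The Python sketch below (a hypothetical helper) accumulates $\log(M_k/M_1)$ from the recurrence $M_k = M_{k-1}(k-1)/(k + \mu/(\mu-p))$ and measures the slope of $-\log M_k$ against $\log k$ between two large values of $k$; the result should be close to $2 + p/(\mu - p)$, with $O(1/k)$ corrections.

```python
import math


def local_exponent(p: float, mu: float, k1: int, k2: int) -> float:
    """Hypothetical helper: slope of -log M_k vs log k between k1 and k2.

    log(M_k / M_1) is accumulated from M_k = M_{k-1} (k-1) / (k + mu/(mu-p));
    the constant log M_1 cancels in the slope.
    """
    if not 2 <= k1 < k2:
        raise ValueError("need 2 <= k1 < k2")
    c = mu / (mu - p)
    log_ratio = 0.0  # log(M_k / M_1), starting from k = 1
    at = {}
    for k in range(2, k2 + 1):
        log_ratio += math.log(k - 1) - math.log(k + c)
        if k == k1 or k == k2:
            at[k] = log_ratio
    return -(at[k2] - at[k1]) / (math.log(k2) - math.log(k1))


print(local_exponent(1.0, 3.0, 1000, 2000), 2 + 1.0 / (3.0 - 1.0))
```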


A special case of $H(H_0, p, Y)$ arises when $Y_t = d$ for all $t$ (a constant); the hypergraph is then $d$-uniform and is denoted $H(H_0, p, d)$.

Corollary 4.4. The degree distribution of a $d$-uniform hypergraph $H(H_0, p, d)$ follows a power law with $\beta = 2 + p/(d - p)$.

Figure 2 illustrates the difference in the exponent $\beta$ between preferential attachment graphs (i.e., 2-uniform hypergraphs) and 3-uniform hypergraphs as a function of $p$.
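The two curves compared in Figure 2 follow directly from Corollary 4.4. The short Python sketch below (the helper name is hypothetical) evaluates $\beta = 2 + p/(d - p)$ exactly for $d = 2$ (graphs) and $d = 3$; at $p = 1$ it gives 3 and $5/2$, the upper ends of the two ranges.

```python
from fractions import Fraction


def pa_exponent(p, mu) -> Fraction:
    """Hypothetical helper: beta = 2 + p / (mu - p); use mu = d if d-uniform."""
    p, mu = Fraction(p), Fraction(mu)
    if not (0 < p <= 1 and mu >= 2):
        raise ValueError("expected 0 < p <= 1 and mu >= 2")
    return 2 + p / (mu - p)


for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    print(f"p = {p}: graph beta = {float(pa_exponent(p, 2)):.3f}, "
          f"3-uniform beta = {float(pa_exponent(p, 3)):.3f}")
```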

In many cases one can only observe the graph $G(H)$ that results from the underlying hypergraph $H$. That is, the vertex set of $G(H)$ is identical to the vertex set of $H$, and for every hyperedge $e$ of $H$ we add edges to $G(H)$ that form a clique among all the vertices in $e$. We can now prove the following.
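The construction of $G(H)$ is a clique expansion of $H$. The Python sketch below (helper names are hypothetical) builds it for the hypergraph of Figure 1(a). It assumes, since the text does not specify this, that a pair of vertices sharing several hyperedges yields a single edge of $G(H)$, and that repeated members of a hyperedge do not create self loops.

```python
from collections import Counter
from itertools import combinations


def observed_graph(hyperedges) -> set:
    """Hypothetical helper: edge set of G(H), a clique on each hyperedge.

    Assumptions (illustrative): parallel edges are merged and repeated members
    of a hyperedge do not create self loops.
    """
    edges = set()
    for e in hyperedges:
        for u, v in combinations(sorted(set(e)), 2):
            edges.add((u, v))
    return edges


def graph_degrees(edges) -> Counter:
    """Degree of every vertex in an undirected simple graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


FIG1 = [["a", "b", "e", "f"], ["a", "c", "d", "g"], ["b", "c", "d"], ["e", "f"]]
print(len(observed_graph(FIG1)))  # 14 distinct edges
print(graph_degrees(observed_graph(FIG1)))
```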
Claim 4.5. The degree distribution of the observed graph $G(H(H_0, p, d))$ that results from a $d$-uniform hypergraph follows a power law with $\beta = 2 + p/(d - p)$.



Figure 2: The exponent $\beta$ of a preferential attachment graph and of a 3-uniform hypergraph as a function of $p$ (the probability of a vertex arrival event). For graphs it lies between 2 and 3, whereas for 3-uniform hypergraphs it lies between 2 and 2.5.

Note that the expected degree of vertices in $G(H)$ in this case is $d(d-1)/2$. Interestingly, if we generate a new graph $G'$ with expected degree $d(d-1)/2$ according to the classical preferential attachment graph model, then its degree distribution follows a power law with exponent $\beta' = 2 + p/(2-p)$. Hence the observed degree distributions of $G(H)$ and $G'$, with exponents $\beta$ and $\beta'$ respectively, will be different. On the other hand, if we generate $G'$ (using the classical preferential attachment model) so that it matches the degree distribution of $G(H)$, then the average degree will be different. This observation is supported by the simulation results depicted in Figure 3.

This discussion seems to indicate that, in some sense, “the blanket (i.e., the model) is too short”, and one should be careful in deciding which model captures the observed degree distribution, and in particular whether the generative model is a hypergraph model or the classical graph model.

5 Example 

Figure 3: Example of the cumulative degree distribution of three networks with $n = 10{,}000$: (1) a graph $G(H(*, 1, 3))$ derived from a 3-uniform hypergraph $H(*, 1, 3)$, (2) a graph $G(H(*, \ldots, 3))$ derived from a 3-uniform hypergraph $H(*, \ldots, 3)$, and (3) a preferential attachment graph with average degree $d(d-1)/2 = 3$. Graphs derived from hypergraphs have a lower exponent, also as a function of $p$.

To test the above observations empirically, we studied a coauthorship hypergraph of computer science researchers extracted from DBLP […], a dataset recording most of the publications in computer science. This hypergraph consists of hundreds of thousands of vertices (representing authors) and hyperedges (representing papers). Figure 4 shows the distribution of hyperedge sizes in DBLP for hyperedges of size at least 3. The hyperedge size distribution closely fits a power law with exponent $\beta = 4.66$. This means that the hyperedge size is (with high probability) smaller than $m^{1/3}$, where $m$ is the number of papers (hyperedges). For DBLP, where the number of papers is $m = 2{,}420{,}879$, we have $m^{1/3} \approx 134.3$, so the number of authors on a paper (i.e., the hyperedge size) will with high probability be at most 134.
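The bound of 134 can be reproduced with exact integer arithmetic, which avoids floating-point rounding in the cube root: $134^3 = 2{,}406{,}104 \le m < 135^3 = 2{,}460{,}375$. A small Python sketch (the helper name is hypothetical):

```python
def integer_cube_root(n: int) -> int:
    """Hypothetical helper: largest integer r with r**3 <= n, for n >= 0."""
    lo, hi = 0, 1
    while hi ** 3 <= n:          # grow hi until hi**3 > n
        hi *= 2
    while lo < hi - 1:           # invariant: lo**3 <= n < hi**3
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo


m = 2_420_879  # number of DBLP papers (hyperedges)
print(integer_cube_root(m), round(m ** (1 / 3), 2))
```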
Figure 4: The distribution of hyperedge sizes in DBLP for hyperedges of size at least 3. The distribution closely fits a power law with exponent $\beta = 4.66$.